Outliers are frequently found in data sets and can cause problems for researchers if not addressed. Failure to identify and deal with outliers appropriately may lead researchers to report erroneous results. Using a multiple regression context, this paper examines some of the reasons for the presence of outliers and simple methods for identifying them. Heuristic data sets and scatterplots illustrate the concepts discussed. (Contains 2 figures, 2 tables, and 11...

Topics: ERIC Archive, Heuristics, Regression (Statistics), Vannoy, Martha

Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the squared differences between observed scores on a dependent variable and the scores predicted from the independent variables. From a computer printout of a regression analysis, the researcher can obtain weights that apply to each variable and then construct this equation. Certain univariate analyses...

Topics: ERIC Archive, Correlation, Equations (Mathematics), Regression (Statistics), Scaling, Vidal, Sherry

This paper considers the use of commonality analysis as an effective tool for analyzing relationships between variables in multiple regression or canonical correlational analysis (CCA). The merits of commonality analysis are discussed and the procedure for running commonality analysis is summarized as a four-step process. A heuristic example is offered as a demonstration of the use of commonality analysis, and the potential limitations and advantages of commonality analysis are discussed. An...

Topics: ERIC Archive, Correlation, Regression (Statistics), Kroff, Michael W.

All parametric statistical analyses have certain assumptions about the data that must be met reasonably to warrant the use of a given analysis. Distributional normality, for example, is a common assumption. There is a variety of ways that data in a distribution may detract from normality, but one common problem is the presence of outliers. Many applied regression researchers, however, are unfamiliar with the potential role and process of robust regression procedures. Robust regression methods...

Topics: ERIC Archive, Estimation (Mathematics), Regression (Statistics), Robustness (Statistics), Lane, Ken
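
The abstract above is truncated before naming specific robust procedures; as a hedged, self-contained illustration of the general idea, the sketch below (hypothetical data, plain Python) contrasts the Theil-Sen median-of-pairwise-slopes estimator, one well-known robust alternative, with ordinary least squares when a single gross outlier is present:

```python
import statistics

# Hypothetical data: a linear trend (roughly y = 2x) plus one gross outlier at x = 9
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1, 14.0, 16.2, 40.0]

# Theil-Sen estimator: the median of all pairwise slopes resists outliers
slopes = [(y[j] - y[i]) / (x[j] - x[i])
          for i in range(len(x)) for j in range(i + 1, len(x))]
ts_slope = statistics.median(slopes)

# Ordinary least squares slope, for contrast
n = len(x)
mx, my = sum(x) / n, sum(y) / n
ols = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
      / sum((xi - mx) ** 2 for xi in x)
```

The median of pairwise slopes stays near the underlying trend (about 2), while the OLS slope is pulled above 3 by the single aberrant point.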

Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in controlling the tree sizes to some extent and thus estimating the prediction error by the tree within a certain range of tree size. In the CART pruning...

Topics: ERIC Archive, Algorithms, Decision Making, Equations (Mathematics), Prediction, Regression...

Commonality analysis is a method of decomposing the R squared in a multiple regression analysis into the proportion of explained variance of the dependent variable associated with each independent variable uniquely and the proportion of explained variance associated with the common effects of one or more independent variables in various combinations. Unlike other variance partitioning methods (e.g., stepwise regression) that distort the results, commonality analysis considers all possible...

Topics: ERIC Archive, Heuristics, Prediction, Regression (Statistics), Amado, Alfred J.
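
For the two-predictor case, the decomposition described above can be sketched in a few lines of plain Python (hypothetical scores; the closed-form R-squared expression used here applies to the standardized two-predictor model):

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return num / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))

# Hypothetical criterion y and two correlated predictors x1, x2
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y  = [1, 3, 2, 5, 4, 7, 6, 8]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# R-squared for the full two-predictor model (standardized closed form)
r2_full = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

unique_1 = r2_full - r_y2**2            # variance explained only by x1
unique_2 = r2_full - r_y1**2            # variance explained only by x2
common   = r2_full - unique_1 - unique_2  # variance shared by x1 and x2
```

The unique and common components always sum back to the full-model R-squared, which is the defining property of the decomposition.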

This paper presents an overview of logistic regression and illustrates the method, including the data transformations that are conducted. It also discusses the interpretation of logistic regression results. To make the discussion more concrete, an analysis of a data set is presented in which logistic regression is used to predict the likelihood of a college student's withdrawing from or failing a course. Logistic regression is a well-suited analysis technique when a dichotomous dependent variable is...

Topics: ERIC Archive, Predictor Variables, Regression (Statistics), Brooks, B. Meade
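
As a minimal, self-contained sketch of the technique described above (hypothetical data; a real analysis would use a statistical package), logistic regression coefficients can be fit by gradient ascent on the log-likelihood:

```python
import math

# Hypothetical data: hours absent (x) and whether the student withdrew
# from or failed the course (1) versus passed (0)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Fit log-odds(y) = b0 + b1*x by gradient ascent on the log-likelihood
b0 = b1 = 0.0
lr = 0.01
for _ in range(20000):
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        g0 += yi - p           # gradient w.r.t. the intercept
        g1 += (yi - p) * xi    # gradient w.r.t. the slope
    b0 += lr * g0
    b1 += lr * g1

def prob(xi):
    """Predicted probability of withdrawing or failing at xi hours absent."""
    return sigmoid(b0 + b1 * xi)
```

With these data the fitted slope is positive, so the predicted probability of the adverse outcome rises with absences.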

Multiple regression analysis is used with considerable frequency by researchers as a means of predicting the impact of predictor variables on a dependent variable. Regression predictors are typically correlated, often intentionally. To better understand the relative contribution of each independent variable in regression (and other) analyses, researchers can partition the squared multiple correlation (R squared) into constituent portions that can be attributed to the independent variables both...

Topics: ERIC Archive, Correlation, Predictor Variables, Regression (Statistics), Cool, Angela L.

Outliers are extreme data points that have the potential to influence statistical analyses. Outlier identification is important to researchers using regression analysis because outliers can influence the model used to such an extent that they seriously distort the conclusions drawn from the data. The effects of outliers on regression analysis are discussed, and examples of various detection methods are given. Most outlier detection methods involve the calculation of residuals. Given that the...

Topics: ERIC Archive, Identification, Regression (Statistics), Evans, Victoria P.
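
A minimal sketch of residual-based detection (hypothetical data): fit the regression, standardize the residuals by the residual standard error, and flag cases beyond a conventional cutoff such as |z| > 2:

```python
import math

# Hypothetical (x, y) data with one aberrant point
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 25.0, 16.1]  # the y at x = 7 is suspect

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
     / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual standard error
standardized = [r / s for r in residuals]

# Indices of cases whose standardized residual exceeds 2 in absolute value
outliers = [i for i, z in enumerate(standardized) if abs(z) > 2]
```

Here only the aberrant seventh case (index 6) is flagged; note that the outlier also inflates s, which is one reason more refined (e.g., studentized) residuals are often preferred.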

This paper discusses the importance of interpreting both regression coefficients and structure coefficients when analyzing the results of multiple regression analysis, particularly with correlated predictor variables. The concepts of multicollinearity and suppressor effects are introduced, along with examples from previously published articles that demonstrate how erroneous conclusions are drawn when researchers fail to consult both beta weights and structure coefficients (or both beta...

Topics: ERIC Archive, Predictor Variables, Regression (Statistics), Burdenski, Thomas K., Jr.

One of the innovative approaches in the use of hierarchical linear models (HLM) is to use HLM for Slopes as Outcomes models. This implies that the researcher considers that the regression slopes vary from cluster to cluster randomly as well as systematically with certain covariates at the cluster level. Among the covariates, group indicator variables at the cluster level, which classify the cluster units into several groups, are often found to be significant predictors. If this is the case, the...

Topics: ERIC Archive, Mathematical Models, Prediction, Regression (Statistics), Miyazaki, Yasuo

Logistic regression was used to develop appropriate weights for an academic admission index. A combined sample of 3-year freshman cohorts (fall 1996 through fall 1998) was used to develop the index. The weights in several logistic regression analyses for high school class percentile and ACT composite score predicting different college outcomes were taken into consideration to compose a simplified academic admission index. The effectiveness of the index was examined by several outcome measures...

Topics: ERIC Archive, College Admission, College Freshmen, Decision Making, Higher Education, Regression...

A regression procedure is developed to link simultaneously a very large number of item response theory (IRT) parameter estimates obtained from a large number of test forms, where each form has been separately calibrated and where forms can be linked on a pairwise basis by means of common items. An application is made to forms in which a two-parameter logistic model is applied to dichotomous items and a general partial credit model is applied to polytomous items.

Topics: ERIC Archive, Regression (Statistics), Item Response Theory, Models, Equated Scores, Haberman,...

Multiple regression is commonly used in social and behavioral data analysis. In multiple regression contexts, researchers are very often interested in determining the "best" predictors in the analysis. This focus may stem from a need to identify those predictors that are supportive of theory. Alternatively, the researcher may simply be interested in explaining the most variability in the dependent variable with the fewest possible predictors, perhaps as part of a cost analysis. Two...

Topics: ERIC Archive, Multiple Regression Analysis, Predictor Variables, Regression (Statistics),...

This presentation discusses the use of a time series approach to the analysis of daily attendance in two urban high schools over the course of one school year (2009-10). After establishing that the series for both schools were stationary, they were examined for moving average processes, autoregression, seasonal dependencies (weekly cycles), outliers and heteroscedasticity. Seasonal dependencies were significant in both schools. In addition, contrary to what the traditional attendance statistics...

Topics: ERIC Archive, Attendance, High Schools, Urban Schools, Data Analysis, Regression (Statistics),...
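
The seasonal (weekly) dependency described above can be illustrated with a sample autocorrelation function on hypothetical daily attendance rates (the study's actual data are not reproduced here):

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((s - m) ** 2 for s in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom

# Hypothetical daily attendance rates over four 5-day school weeks,
# with a dip each Friday (a weekly cycle)
attendance = [0.95, 0.94, 0.95, 0.93, 0.88] * 4

r_lag5 = autocorr(attendance, 5)  # strong weekly (seasonal) dependency
r_lag1 = autocorr(attendance, 1)  # weak day-to-day dependency here
```

The lag-5 autocorrelation is large because the series repeats on a weekly cycle, while the lag-1 value is small for this pattern.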

The California Postsecondary Education Commission periodically conducts studies of university eligibility of public high school graduates. The eligibility rates from these studies are the proportion of high school graduates who qualify for freshman admission to California public universities. Eligibility is based on completion of specific high school courses, standardized test scores, and grade-point average. Eligibility rates are higher than the percent of students who actually enter the...

Topics: ERIC Archive, High Schools, Standardized Tests, Eligibility, High School Graduates, Predictor...

Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations. Such groups may easily include enough outliers to break down the parameter estimates. Models with many fixed or random effects appear to be especially...

Topics: ERIC Archive, Social Sciences, Regression (Statistics), Computation, Models, Predictor Variables,...

The Gross Domestic Product (GDP) of any country often influences economic decisions by policy makers, market participants, and econometricians on policy recommendations, evaluation, and forecasting. However, these decisions are often based on preliminary data announcements by statistical agencies. It is therefore important to ensure that the preliminary GDP announcements are efficient and can be relied on. This paper focuses on South Africa's preliminary announcements of quarterly GDP estimates by examining...

Topics: ERIC Archive, Foreign Countries, Economic Factors, Computation, Data, Least Squares Statistics,...

Least squares methods are sophisticated mathematical curve-fitting procedures used in all classical parametric methods. The linear least squares approximation is most often associated with finding the "line of best fit" or the regression line. Since all statistical analyses are correlational and all classical parametric methods are least squares procedures, it becomes imperative to understand just what the least squares procedure is and how it works. This paper illustrates the least...

Topics: ERIC Archive, Goodness of Fit, Least Squares Statistics, Matrices, Regression (Statistics)
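
A minimal worked example of the "line of best fit" (hypothetical points): the closed-form least squares solution for simple regression, with the defining property that the residuals sum to zero:

```python
# Hypothetical points; the least squares line minimizes the sum of squared
# vertical distances between the observed y values and the fitted line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

# Residuals from the fitted line; least squares forces them to sum to zero
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

For these points the fitted line is y = 2.2 + 0.6x, a classic textbook example.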

The effect of a nonlinear regression term on the behavior of the standard analysis of covariance (ANCOVA) F test was investigated for balanced and randomized designs through a Monte Carlo study. The results indicate that the use of the standard analysis of covariance model when a quadratic term is present has little effect on Type I error rates but produces a substantial power loss compared to theoretically expected values, often in excess of 20%. The extent of the power loss depends on the...

Topics: ERIC Archive, Analysis of Covariance, Monte Carlo Methods, Power (Statistics), Regression...

Presented at the Annual Meeting of the American Educational Research Association (AERA) in April 2009. Compares results of different approaches to propensity-score matching with hierarchical data.

Topics: ERIC Archive, Comparative Analysis, Statistical Analysis, Computation, Probability, Regression...

This conference presentation reviews the authors' work on autocorrelations in single-case designs. The bias-corrected autocorrelation is computed, and results are meta-analyzed with a 5-level multilevel analysis in SAS Proc Mixed. Results suggest autocorrelations are normally distributed, and that taking into account nesting in outcomes and articles accounts for a large amount of variance in the autocorrelation.

Topics: ERIC Archive, Correlation, Research Design, Meta Analysis, Statistical Distributions, Hierarchical...

The increased use of multiple regression analysis in research warrants closer examination of the coefficients produced in these analyses, especially ones which are often ignored, such as structure coefficients. Structure coefficients are bivariate correlation coefficients between a predictor variable and the synthetic variable (the predicted Yhat scores). When predictor variables are correlated with each other, regression results may be seriously distorted by failure to interpret structure coefficients. Structure...

Topics: ERIC Archive, Correlation, Factor Analysis, Predictor Variables, Regression (Statistics), Research...
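
A small sketch (hypothetical data) showing the definition in action: the structure coefficient is the correlation between a predictor and the synthetic Yhat scores, which for standardized variables equals r(x, y) divided by the multiple correlation R:

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return num / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))

def zscores(a):
    n = len(a)
    m = sum(a) / n
    sd = math.sqrt(sum((u - m) ** 2 for u in a) / n)
    return [(u - m) / sd for u in a]

# Hypothetical criterion y and two correlated predictors
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y  = [1, 3, 2, 5, 4, 7, 6, 8]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Standardized regression (beta) weights for the two-predictor model
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)

# Synthetic (predicted) scores, then structure coefficients r(x_j, yhat)
z1, z2 = zscores(x1), zscores(x2)
yhat = [beta1 * u + beta2 * v for u, v in zip(z1, z2)]
rs1, rs2 = pearson(x1, yhat), pearson(x2, yhat)

# Equivalent shortcut: r(x_j, y) divided by the multiple correlation R
R = math.sqrt((r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2))
```

The two routes agree exactly, which is why structure coefficients can be obtained without ever forming Yhat explicitly.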

Using a hypothetical data set of 24 cases concerning opinions on contemporary issues on which Democrats and Republicans might disagree, concrete examples are provided to illustrate that canonical correlation analysis is the most general linear model, subsuming other parametric procedures as special cases. Specific statistical techniques included in the analysis are "t"-tests, Pearson correlation, multiple regression, analysis of variance, multivariate analysis of variance, and...

Topics: ERIC Archive, Analysis of Variance, Correlation, Discriminant Analysis, Heuristics, Multivariate...

The paper stresses the importance of consulting beta weights and structure coefficients in the interpretation of regression results. The effects of multicollinearity and suppressor variables on the interpretation of beta weights are discussed. It is concluded that interpretations based only on beta weights can lead the unwary researcher to inaccurate conclusions. Despite warnings, though, researchers are still using only beta weights in the interpretation of regression analyses. A review of...

Topics: ERIC Archive, Regression (Statistics), Research Methodology, Research Reports, School...

Using a longitudinal data set obtained from 169 pre-adolescent children between the ages of 8 and 13 years, this study statistically divided locus of control into two independent components. The first component was noted as "age-dependent" (AD) and was determined by predicted values generated by regressing children's ages onto their locus of control scores. The second component was called "age-independent" (AI) and was determined by the residual scores from the regression...

Topics: ERIC Archive, Age Differences, Locus of Control, Longitudinal Studies, Preadolescents, Predictor...

Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources and types of heteroscedasticity are discussed, and methods for correction are demonstrated. Graphs are used to illustrate different patterns that may be caused by heteroscedasticity. An extensive example for using Weighted Least Squares...

Topics: ERIC Archive, Graphs, Least Squares Statistics, Regression (Statistics), Thompson, Russel L.
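
A minimal sketch of Weighted Least Squares for simple regression (hypothetical data; the 1/x² weights are illustrative, chosen as if the error variance grew with x):

```python
# Hypothetical data whose spread grows with x; weights of 1/x**2 downweight
# the noisier observations (the weight choice is illustrative, not prescriptive)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.4, 7.5, 11.0, 11.2]
w = [1 / xi**2 for xi in x]

# Weighted means, then the closed-form WLS slope and intercept
sw = sum(w)
mxw = sum(wi * xi for wi, xi in zip(w, x)) / sw
myw = sum(wi * yi for wi, yi in zip(w, y)) / sw
slope = sum(wi * (xi - mxw) * (yi - myw) for wi, xi, yi in zip(w, x, y)) \
        / sum(wi * (xi - mxw) ** 2 for wi, xi in zip(w, x))
intercept = myw - slope * mxw
```

With equal weights this reduces to ordinary least squares, which is a useful check when experimenting with weight schemes.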

This paper empirically and systematically assessed the performance of the bootstrap resampling procedure as applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from the population) and bootstrap experiments (repeated resampling from one original sample) were generated and compared. Sample sizes of 20, 30, 50, and 100 were considered in the simulation. Ten independent Monte Carlo experiments and 10 independent bootstrap experiments were...

Topics: ERIC Archive, Estimation (Mathematics), Monte Carlo Methods, Regression (Statistics), Sample Size,...
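
The resampling scheme described above can be sketched as follows (hypothetical sample; case resampling with replacement, refitting the slope each time):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def ols_slope(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    return sum((px - mx) * (py - my) for px, py in pairs) \
           / sum((px - mx) ** 2 for px, py in pairs)

# Hypothetical original sample (roughly y = 2x with small noise)
data = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8),
        (6, 12.1), (7, 14.0), (8, 15.8), (9, 18.2), (10, 19.9)]

# Resample cases with replacement and refit the model each time
boot_slopes = []
for _ in range(1000):
    resample = [random.choice(data) for _ in range(len(data))]
    if len(set(px for px, _ in resample)) > 1:  # skip degenerate resamples
        boot_slopes.append(ols_slope(resample))

# Bootstrap estimate of the slope and its standard error
m = sum(boot_slopes) / len(boot_slopes)
se = (sum((b - m) ** 2 for b in boot_slopes) / (len(boot_slopes) - 1)) ** 0.5
```

The spread of the resampled slopes serves as an empirical standard error, which is the quantity the paper compares against Monte Carlo sampling from the population.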

This study examined the reliability of three methods for detecting differential item functioning (DIF) (i.e., the Mantel-Haenszel method, the standardization method, and the logistic regression method) applied to achievement test data. In addition, the study examined the influences of different sources of error variance, including examinee, occasion, and curriculum sampling on the magnitude of the reliability of the different DIF detection methods. Three datasets were assembled from the 1992...

Topics: ERIC Archive, Identification, Item Bias, Regression (Statistics), Reliability, Sampling, Tables...

The Johnson-Neyman (J-N) technique (P. Johnson and J. Neyman, 1936) is used to determine areas of significant difference in a criterion variable between two or more groups in situations of linear regression. In using this technique, researchers have encountered difficulties with results, possibly related to the J-N technique's sensitivity to violations of certain assumptions and conditions. For this study, Monte Carlo simulations were performed to determine the effect that sample size and...

Topics: ERIC Archive, Monte Carlo Methods, Regression (Statistics), Sample Size, Simulation, Wind, Brian...

This study applies a procedure which yields estimates of true score change on the Scholastic Aptitude Test (SAT) adjusted for regression effects and student self-selection. It is shown that student self-selection in deciding to repeat an admissions test probably involves factors in addition to the measurement error attributable to variations in aspects of test specifications and to variations in responses of test candidates across forms, and that estimated true score change remains nearly...

Topics: ERIC Archive, College Entrance Examinations, Error of Measurement, Regression (Statistics), Scores,...

The information that is gained through various analyses of the residual scores yielded by the least squares regression model is explored. In fact, the most widely used methods for detecting data that do not fit this model are based on an analysis of residual scores. First, graphical methods of residual analysis are discussed, followed by a review of several quantitative approaches. Only the more widely used approaches are discussed. Example data sets are analyzed through the use of the...

Topics: ERIC Archive, Graphs, Identification, Least Squares Statistics, Regression (Statistics), Research...

Multiple regression is a useful statistical technique when the researcher is considering situations in which variables of interest are theorized to be multiply caused. It may also be useful in those situations in which the researcher is interested in studies of the predictability of phenomena of interest. This paper provides an introduction to regression analysis, focusing on five major questions a novice user might ask. The presentation is set in the framework of the general linear model and...

Topics: ERIC Archive, Data Analysis, Regression (Statistics), Daniel, Larry G., Onwuegbuzie, Anthony J.

The assumption most important to the hypothesis testing procedure of multiple linear regression is that the residuals are normally distributed, but this assumption is not always tenable given the realities of some data sets. When normality of the residuals is not met, an alternative method can be employed. As an alternative, data for one or more of the variables under study can be transformed in order to increase conformity to the required distributional...

Topics: ERIC Archive, Hypothesis Testing, Regression (Statistics), Statistical Distributions,...

Nine statistical strategies for selecting equating functions in an equivalent groups design were evaluated. The strategies of interest were likelihood ratio chi-square tests, regression tests, Kolmogorov-Smirnov tests, and significance tests for equated score differences. The most accurate strategies in the study were the likelihood ratio tests and the significance tests for equated score differences.

Topics: ERIC Archive, Equated Scores, Statistical Analysis, Statistical Significance, Regression...

Now that the school effectiveness field is maturing, more refined and contextually sensitive observations about schools are possible. This paper focuses on socioeconomic status (SES) as one social context variable demonstrating substantial predictive power in numerous school improvement studies. Instead of viewing middle class behavior as superior and lower class behavior as deficient, this paper explores how human variation may be exploited for the enrichment of all members of society. To this...

Topics: ERIC Archive, Behavior Standards, Elementary Education, Middle Class, Regression (Statistics),...

Discrete Choice Marketing (DCM), a research technique that has become more popular in recent marketing research, is described. DCM is a method that forces people to look at the combination of relevant variables within each choice domain and, with each option fully defined in terms of the values for those variables, to make a choice among the options. DCM provides more reliable and valid results than do its simpler survey relatives because it more closely resembles the environment in which people...

Topics: ERIC Archive, Marketing, Regression (Statistics), Research, Scoring, Surveys, Berdie, Doug R.

This paper examines the differences between multilevel modeling and weighted ordinary least squares (OLS) regression for analyzing data from the National Educational Longitudinal Study 1988 (NELS:88). The final sample consisted of 718 students in 298 schools. Eighteen variables from the NELS:88 dataset were used, with the dependent variable being the science item response theory estimated number right standardized t-score. Results from the analyses yield no single criterion for choosing one...

Topics: ERIC Archive, Least Squares Statistics, National Surveys, Regression (Statistics), Research Design,...

The problem with "classical" statistics, all of which invoke the mean, is that these estimates are notoriously influenced by atypical scores (outliers), partly because the mean itself is differentially influenced by outliers. In theory, "modern" statistics may generate more replicable characterizations of data, because at least in some respects the influence of more extreme scores, which are less likely to be drawn in future samples from the tails of a non-uniform (non-rectangular or...

Topics: ERIC Archive, Statistics, Statistical Analysis, Regression (Statistics), Monte Carlo Methods,...
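
One of the simplest "modern" alternatives, the trimmed mean, illustrates the point (hypothetical scores; one extreme value):

```python
import statistics

# Hypothetical scores with one extreme value
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 40]

mean = statistics.mean(scores)  # the classical mean is pulled toward 40

def trimmed_mean(data, prop=0.20):
    """Mean after dropping the lowest and highest `prop` fraction of scores."""
    data = sorted(data)
    k = int(len(data) * prop)
    return statistics.mean(data[k:len(data) - k])

tm = trimmed_mean(scores)  # 20% trimmed mean, far less affected by the outlier
```

Here the classical mean (8.5) sits above every typical score, while the trimmed mean stays near the bulk of the distribution.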

Many researchers are unfamiliar with suppressor variables and how they operate in multiple regression analyses. This paper describes the role suppressor variables play in a multiple regression model and provides practical examples that explain how they can change research results. A variable that, when added as another predictor, increases the squared multiple correlation (R squared) is a suppressor variable. Suppressor variables measure invalid variance in the predictor measures and...

Topics: ERIC Archive, Correlation, Predictor Variables, Regression (Statistics), Suppressor Variables,...
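
The R-squared increase described above can be verified directly from the two-predictor closed form (illustrative correlation values, not taken from the paper):

```python
# Illustrative correlations: x2 has no direct correlation with the criterion
# but shares (invalid) variance with x1, the classical suppressor pattern
r_y1 = 0.50   # predictor 1 with the criterion y
r_y2 = 0.00   # suppressor: zero correlation with y on its own
r_12 = 0.60   # suppressor correlates with predictor 1

# R-squared with x1 alone versus with both predictors (standardized form)
r2_x1_alone = r_y1**2
r2_both = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
```

Even though r_y2 is zero, adding x2 raises R-squared from 0.25 to about 0.39 because it suppresses irrelevant variance in x1.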

When survey data are statistically analyzed, some of the data are often missing. If the missing values are not correctly handled, results of the analysis may be dubious, and publication may jeopardize the credibility of the organization preparing the report. This study examined four of the more commonly used methods of handling missing data. The following techniques were compared: (1) listwise deletion; (2) pairwise deletion; (3) mean substitution; and (4) regression imputation of missing...

Topics: ERIC Archive, Comparative Analysis, Evaluation Methods, Predictive Measurement, Regression...
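
Methods (3) and (4) above can be sketched in a few lines (hypothetical records; listwise and pairwise deletion simply drop cases and need no illustration):

```python
# Hypothetical records; None marks a missing value on variable y
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, None, 10.1, 12.0]

observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

# (3) Mean substitution: replace with the mean of the nonmissing scores
mean_y = sum(yi for _, yi in observed) / len(observed)
y_mean_sub = [yi if yi is not None else mean_y for yi in y]

# (4) Regression imputation: predict the missing y from x
n = len(observed)
mx = sum(xi for xi, _ in observed) / n
my = sum(yi for _, yi in observed) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in observed) \
     / sum((xi - mx) ** 2 for xi, _ in observed)
b0 = my - b1 * mx
y_reg_imp = [yi if yi is not None else b0 + b1 * xi for xi, yi in zip(x, y)]
```

Mean substitution fills the gap with 6.82 regardless of x, while regression imputation fills it with about 8.0, preserving the trend in the data.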

A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn for each group. If there are regions of significance, their boundaries are noted either on the graph or in the text. If there is a manageable number...

Topics: ERIC Archive, Computer Software, Graphs, Interactive Video, Predictor Variables, Regression...

The concept of the general linear model (GLM) is illustrated and how canonical correlation analysis is the GLM is explained, using a heuristic data set to demonstrate how canonical correlation analysis subsumes various multivariate and univariate methods. The paper shows how each of these analyses produces a synthetic variable, like the Yhat variable in regression. Ultimately these synthetic variables are actually analyzed in all statistics, a fact that is important to researchers who want to...

Topics: ERIC Archive, Correlation, Heuristics, Multivariate Analysis, Regression (Statistics),...

Data from records of 99 patients were used to classify cardiac patients as to whether they were likely or unlikely to experience a subsequent morbid event after admission to a hospital. Both a linear discriminant function and a logistic regression equation were developed using a set of nine predictor variables that were chosen on the basis of their correlations with the likelihood of a subsequent morbid event. Once the models were obtained, artificially-generated missing values were replaced...

Topics: ERIC Archive, Classification, Heart Disorders, Patients, Predictor Variables, Regression...

The business of science is formulating generalizable insight. No one study, taken singly, establishes the basis for such insight. Meta-analysis, however, can be used to determine if results generalize and to estimate the mean and the variance of effect sizes across studies (J. Hunter and F. Schmidt, 1990). Meta-analysis inquiries treat studies (rather than people) as the units of analysis, and then use regression or other methods to determine the study features that explain or predict...

Topics: ERIC Archive, Comparative Analysis, Effect Size, Meta Analysis, Regression (Statistics), Moore,...

This Digest presents a discussion of the assumptions of multiple regression that is tailored to the practicing researcher. The focus is on the assumptions of multiple regression that are not robust to violation, and that researchers can deal with if violated. Assumptions of normality, linearity, reliability of measurement, and homoscedasticity are considered. Checking these assumptions carries significant benefits for the researcher, and making sure an analysis meets the associated assumptions...

Topics: ERIC Archive, Error of Measurement, Nonparametric Statistics, Regression (Statistics), Reliability,...

Background: An extensive body of research has favored the use of regression over other parametric analyses that are based on OVA. In the case of noteworthy regression results, researchers tend to explore the magnitude of beta weights for the respective predictors. Purpose: The purpose of this paper is to examine both beta weights and structure coefficients in interpreting regression results. Data Collection and Analysis: Two heuristic examples will be illustrated. Findings: When predictor variables...

Topics: ERIC Archive, Researchers, Predictor Variables, Regression (Statistics), Comparative Analysis,...

Well-known numerical integration methods are applied to item response theory (IRT) with special emphasis on the estimation of the latent regression model of NAEP [National Assessment of Educational Progress]. An argument is made that the Gauss-Hermite rule enhanced with Cholesky decomposition and normal approximation of the response likelihood is a fast, precise, and reliable alternative for the numerical integration in NAEP and in IRT in general.

Topics: ERIC Archive, Item Response Theory, Computation, Regression (Statistics), National Competency...

Missing data occur in virtually every study. This paper reviews some of the various strategies for addressing this problem. The paper also provides instructional detail on two accessible ways of estimating missing data, both using the Statistical Package for the Social Sciences for Windows: (1) substitution of missing values with the variable mean of nonmissing scores; and (2) replacement of missing values with estimates derived from regression. Nine tables and five appendixes provide details...

Topics: ERIC Archive, Data Analysis, Estimation (Mathematics), Regression (Statistics), Research...

This paper identifies specific problems with stepwise regression, notes criticisms of stepwise methods by statisticians, suggests appropriate ways in which stepwise procedures can be used, and gives examples of how this can be done. Although the stepwise method has been routinely criticized by statisticians, it is still frequently used in the literature. This paper suggests research situations when stepwise regression may have a valuable function. Stepwise methods can be appropriate for...

Topics: ERIC Archive, Data Analysis, Predictor Variables, Regression (Statistics), Research Methodology,...